This is an example of how to access RaCA data for a given area. The RaCA (rapid carbon assessment) is a rather special dataset created by the NRCS to describe soil organic carbon across the continental United States (i.e. the “lower 48”). It is technically point data describing a pedon (the smallest unit describing soil), but most members of the public can only access this data set using “general” latitude and longitude, that is, low resolution geocoordinates with a radius of approximately 0.55 km around them. I recommend you read the first few pages of the RaCA summary to learn more about this data set if you plan to use it.
The Snake River Plain is a region across Southern Idaho with a largely common geological history, soils and climate. It is also a region of high agricultural productivity. The Snake River Plain is also underlaid by a very large aquifer, whose boundaries will be used in this tutorial for defining the area of interest for soil organic carbon (hereafter “SOC”).
The shapefile was downloaded here from State of Idaho ArcGIS hub. The area in this shapefile is just over 10 million acres. So, it’s actually a massive area, covering a very large chunk of the arable land in Southern Idaho. It is composed of an east and a west side, which are approximately 70% and 30% of the total aquifer land area, respectively. The geological history of the land is not uniform. On the eastern side is Craters of the Moon National Monument, a stunningly beautiful, yet extremely barren region due to (geologically speaking) very recent lava flows. Craters of the moon is an area of considerably different geological history, soil type, and undoubtedly, SOC status than the surrounding area. Thus this is an imperfect shape file to describe the actual target region of arable land of the Snake River Plain, but let us carry on regardless.
#### Load data
library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
Reading layer `snakeaq' from data source
`/Users/juliapiaskowski/Library/CloudStorage/Dropbox/my R folder/soil_carbon_project/data/snakeaq/snakeaq.shp'
using driver `ESRI Shapefile'
Simple feature collection with 2 features and 4 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 2244453 ymin: 1235325 xmax: 2697828 ymax: 1471715
Projected CRS: NAD83 / Idaho Transverse Mercator
# the shape file defined the Eastern and Western portions, which were merged into one# the geocoordinates were transformed from metric points into latitude and longitude coordinates using a projection for the Pacific Northwest region of North American
Load the locations file. It provides a general location (only two decimals points of the geocoordinates, roughly equal to a 0.5 km radius).
raca_locs <-read.csv("data/RaCA/raca_general_location.csv") %>%rename(raca_id ='RaCA_Id', long ="Gen_long", lat ="Gen_lat") %>%distinct() # removing a duplicate row# Find locations with duplicate geocoordinates (there is one)raca_loc_dups <- raca_locs %>%group_by(raca_id) %>%mutate(count =n()) %>%ungroup() %>%filter(count >1)# Filter out duplicated locations (since we don't know which one is correct)raca_locs_filt <-anti_join(raca_locs, raca_loc_dups, by ="raca_id")
Filter data to observations with matching location data.
raca_ped_filt <-inner_join(raca_locs_filt, raca_pedon, by =join_by(raca_id)) raca_samp_filt <-inner_join(raca_locs_filt, raca_samples, by =join_by(raca_id))
Look at the all the data
us <-map_data("state")ggplot(raca_locs_filt, aes(x = long, y = lat)) +geom_polygon(data = us, aes(group = group), fill ="transparent", color ="grey50") +geom_point(alpha =0.5, col ="navy") +coord_map() +xlab("longitude") +ylab("latitude") +ggtitle("RaCA site locations") +theme_linedraw()
Hey, this looks like the map provided in the RaCA summary!
Combine data sets
Filter RaCA data set to only those observations that are within the bounds of the Snake River Aquifer shape file. This example is using the RaCA pedon file, transforming it into a sf object that is combined with the Snake River shape file.
First, turn the RaCA pedon dataset into a spatial ‘sf’ object:
raca_sf <-st_as_sf(raca_ped_filt, coords =c("long", "lat"), crs ="NAD83") # the CRS must match that of the snake river polygon# 8826 is EPGS, which works for the continental US# NAD83 works for the Pacific Northwest, where this is located
Next, filter the samples to those within the target polygon. This example is only for the ‘pedon’ file, but this and the proceding step could be repeated for the sample file.
snake_pedon <-st_filter(raca_sf, snake_poly)
There are 43 observations from RaCA that fall within this Snake River Plain shapefile. A region spanning 10 million acres has 43 data point for SOC, which is roughly 1 point for every 232,000 acres. However, these points are not evenly distributed across the landscape.
Land use
Within these samples, we can also count land use types for the locations:
table(snake_pedon$LU)
C P R W
12 8 20 3
C = cropland P = pastureland R = rangeland W = wetland
Plot the data
SOC stocks to 100 cm: The detail on how this is calculated is included in the RaCA report.
In this data set, SOC to 1 meter has many very low values and a long tail of some very high values. This is better visualised on the log scale:
par(mfrow=c(1,2))hist(raca_ped_filt$SOCstock100, main =NA, xlab ="SOC at 1 m")hist(log(raca_ped_filt$SOCstock100), main =NA, xlab ="log SOC at 1 m")
The RaCA locations we can access are approximate, and the true location may be up to 0.55 km away from its listed geocoordinates (see xkcd cartoon for more on geospatial coordinates and resolution).
Here is code to transform those points into small circular polygons. The function sf::st_buffer is used to create polygons 0.55 km in diameter around each point.
# first transform projection into meterssnake_utm <-st_transform(snake_pedon, "+proj=utm +zone=11") # Southern Idaho is technically in 2 zones (11 & 12), but a slim majority of the total land area is in zone 11# create the circular polygonsnake_utm_circles <-st_buffer(snake_utm, 550)# reproject back to NAD83snake_circles <-st_transform(snake_utm_circles, "+proj=longlat +datum=NAD83")
For plotting at the scale used previously (covering all of Southern Idaho, see above for an example), the results won’t look different, (in fact, the points will be smaller because ggplot is now plotting polygons to scale) but this will impact downstream modelling.